{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "552cacc9-3791-47a7-bacb-0768a1799d5a",
      "metadata": {
        "id": "2eec5cc39a59"
      },
      "outputs": [],
      "source": [
        "# Copyright 2024 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "6d5d6512-5032-4709-9cc9-ec94095189ef",
      "metadata": {
        "id": "42004fb8ef27"
      },
      "source": [
        "# Q&A Chatbot with Vertex AI Search for summarized website results without advanced indexing\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/search/vertexai-search-options/vertex_ai_search_website_summary.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fsearch%2Fvertexai-search-options%2Fvertex_ai_search_website_summary.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/search/vertexai-search-options/vertex_ai_search_website_summary.ipynb\">\n",
        "      <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/search/vertexai-search-options/vertex_ai_search_website_summary.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://raw.githubusercontent.com/primer/octicons/refs/heads/main/icons/mark-github-24.svg\" alt=\"GitHub logo\"><br> View on GitHub\n",
        "    </a>\n",
        "  </td>\n",
        "</table>\n",
        "\n",
        "<div style=\"clear: both;\"></div>\n",
        "\n",
        "<b>Share to:</b>\n",
        "\n",
        "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/search/vertexai-search-options/vertex_ai_search_website_summary.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/search/vertexai-search-options/vertex_ai_search_website_summary.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/search/vertexai-search-options/vertex_ai_search_website_summary.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/search/vertexai-search-options/vertex_ai_search_website_summary.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/search/vertexai-search-options/vertex_ai_search_website_summary.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
        "</a>    "
      ]
    },
    {
      "cell_type": "markdown",
      "id": "51148404-3cff-4d11-af8d-29200e604b26",
      "metadata": {
        "id": "c0f3f355347e"
      },
      "source": [
        "| | |\n",
        "|-|-|\n",
        "|Author | [Neeraj Shivhare](https://github.com/nshivhar) |"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "ab0e6437-898f-41f4-9006-c3d3e341704a",
      "metadata": {
        "id": "2e673e771345"
      },
      "source": [
        "## Objective\n",
        "\n",
        "The main goal of this code is to provide a way to query a website data store in Vertex AI Search, retrieve the most relevant webpage, and extract and summarize its content. This can be used to build a question-answering system or to simply retrieve and present information from a website in a concise manner.\n",
        "\n",
        "### Key Features\n",
        "- Vertex AI Search Integration: Utilizes the Discovery Engine API to query a website data store in Vertex AI Search.\n",
        "- Top Result Retrieval: Selects the first (presumably most relevant) URL from the search results.\n",
        "- Webpage Content Extraction: Fetches the webpage content using requests and extracts relevant information (title, description, page content) using BeautifulSoup.\n",
        "- Gemini 2.0 Summarization: Using Gemini 2.0 to summarize the extracted page content. This would involve sending the page_content to the Gemini API for summarization.\n",
        "\n",
        "\n",
        "## How to use the notebook\n",
        "- Initialization: Initialize the notebook by providing your `project_id` and `data_store_id`.\n",
        "- Search: Call the `get_page_contents` method with your `search_query`. This method will:\n",
        "     1. Perform the search using Vertex AI Search.\n",
        "     2. Extract the first link from the results.\n",
        "     3. Fetch and format the content from the link.\n",
        "- Print the formatted document details (title, source, description, page content).\n",
        "- Summarize the content using Gemini 2.0 (not shown in the code)."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "a9caaa86-fa5f-4b38-bbad-8d3209eac12a",
      "metadata": {
        "id": "628e815b6b1f"
      },
      "source": [
        "## Getting Started\n",
        "\n",
        "### Install Vertex AI SDK for Python"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "75a04e1e-03dc-421d-a8eb-ad53a41baa64",
      "metadata": {
        "id": "3f035428fdba"
      },
      "outputs": [],
      "source": [
        "%pip install google-cloud-discoveryengine==0.12.1 langchain_google_vertexai"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "793c2ef4-8010-4a69-86f1-4ad851b33c07",
      "metadata": {
        "id": "2d4000d88ad8"
      },
      "source": [
        "### Restart kernel"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "id": "e3d8bf53-93a1-4706-8edc-c14a7486c3f5",
      "metadata": {
        "id": "0c5492fd0156"
      },
      "outputs": [],
      "source": [
        "# Restart kernel after installs so that your environment can access the new packages\n",
        "import IPython\n",
        "\n",
        "app = IPython.Application.instance()\n",
        "app.kernel.do_shutdown(True)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "d61c331b-1f44-4a62-9529-d2755d9c4fae",
      "metadata": {
        "id": "6582b5d47c28"
      },
      "source": [
        "### Authenticate your notebook environment (Colab only)\n",
        "\n",
        "If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-notebooks?hl=en)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "id": "c7aa2fc1-33d4-4b48-b02e-4736d13fa577",
      "metadata": {
        "id": "4788c6f28f01"
      },
      "outputs": [],
      "source": [
        "import sys\n",
        "\n",
        "# Additional authentication is required for Google Colab\n",
        "if \"google.colab\" in sys.modules:\n",
        "    # Authenticate user to Google Cloud\n",
        "    from google.colab import auth\n",
        "\n",
        "    auth.authenticate_user()"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9e33e9ef-0617-4ff9-b03f-44157a9e3384",
      "metadata": {
        "id": "9ccc6635848a"
      },
      "source": [
        "### imports"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "id": "f606c3df-f9b6-429c-a05c-aac5212f1605",
      "metadata": {
        "id": "9b3d8b3cfab5"
      },
      "outputs": [],
      "source": [
        "import logging\n",
        "from typing import TypeVar\n",
        "\n",
        "from bs4 import BeautifulSoup\n",
        "from google.api_core.client_options import ClientOptions\n",
        "from google.cloud import discoveryengine_v1beta as discoveryengine\n",
        "from google.protobuf import json_format\n",
        "from langchain_core.prompts import PromptTemplate\n",
        "from langchain_google_vertexai import VertexAI\n",
        "import requests\n",
        "\n",
        "Output_co = TypeVar(\"Output_co\", covariant=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "93617401-cac3-47c5-9e4a-b57081986a78",
      "metadata": {
        "id": "51d2b9ba060f"
      },
      "source": [
        "### Initialization"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "6480c516-7196-4da3-8e7d-3d06d33e9a98",
      "metadata": {
        "id": "00f9e929ba8c"
      },
      "outputs": [],
      "source": [
        "# Use the environment variable if the user doesn't provide Project ID.\n",
        "import os\n",
        "\n",
        "import vertexai\n",
        "\n",
        "PROJECT_ID = \"[your-project-id]\"  # @param {type: \"string\", placeholder: \"[your-project-id]\" isTemplate: true}\n",
        "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
        "    PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n",
        "\n",
        "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")\n",
        "\n",
        "vertexai.init(project=PROJECT_ID, location=LOCATION)\n",
        "!gcloud config set project {project_id}\n",
        "\n",
        "DATA_STORE_LOCATION = \"global\"  # @param {type: \"string\"}\n",
        "DATA_STORE_ID = \"your_web_datastore_id\"  # @param {type: \"string\"}\n",
        "\n",
        "logging.basicConfig(level=logging.INFO)\n",
        "logger = logging.getLogger(__name__)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "f11da7ed-37a2-466b-9607-af11a2511dd5",
      "metadata": {
        "id": "a40e6c6f0faf"
      },
      "source": [
        "### Initialize the Discovery Engine client"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "id": "5e3baab5-9ba6-4608-9297-4d04aa683618",
      "metadata": {
        "id": "2e404af887cf"
      },
      "outputs": [],
      "source": [
        "logger.info(\"Initializing Discovery Engine client...\")\n",
        "client_options = (\n",
        "    ClientOptions(api_endpoint=f\"{LOCATION}-discoveryengine.googleapis.com\")\n",
        "    if LOCATION != \"global\"\n",
        "    else None\n",
        ")\n",
        "client = discoveryengine.SearchServiceClient(client_options=client_options)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "030c31a9-6003-449b-bb8c-9dd735a70093",
      "metadata": {
        "id": "2ca07ed2fdea"
      },
      "source": [
        "### Search the data store using the Google Cloud Discovery Engine API"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "id": "15597eb5-9c72-4dee-b9bb-ea0c6b28773b",
      "metadata": {
        "id": "23dec5882d8e"
      },
      "outputs": [],
      "source": [
        "def get_relevant_snippets(search_query: str) -> None | (discoveryengine.SearchResponse):\n",
        "    \"\"\"\n",
        "    Search the data store using the Google Cloud Discovery Engine API.\n",
        "\n",
        "    Args:\n",
        "        search_query (str): The search query string.\n",
        "\n",
        "    Returns:\n",
        "        Optional[discoveryengine.SearchResponse]:\n",
        "        The search response from the Discovery Engine API.\n",
        "    \"\"\"\n",
        "    logger.info(\"Searching data store with query: %s\", search_query)\n",
        "    try:\n",
        "        serving_config = client.serving_config_path(\n",
        "            project=PROJECT_ID,\n",
        "            location=LOCATION,\n",
        "            data_store=DATA_STORE_ID,\n",
        "            serving_config=\"default_config\",\n",
        "        )\n",
        "\n",
        "        content_search_spec = discoveryengine.SearchRequest.ContentSearchSpec(\n",
        "            snippet_spec=discoveryengine.SearchRequest.ContentSearchSpec.SnippetSpec(\n",
        "                return_snippet=True\n",
        "            )\n",
        "        )\n",
        "\n",
        "        request = discoveryengine.SearchRequest(\n",
        "            serving_config=serving_config,\n",
        "            query=search_query,\n",
        "            page_size=5,\n",
        "            content_search_spec=content_search_spec,\n",
        "            query_expansion_spec=discoveryengine.SearchRequest.QueryExpansionSpec(\n",
        "                condition=discoveryengine.SearchRequest.QueryExpansionSpec.Condition.AUTO,\n",
        "            ),\n",
        "            spell_correction_spec=discoveryengine.SearchRequest.SpellCorrectionSpec(\n",
        "                mode=discoveryengine.SearchRequest.SpellCorrectionSpec.Mode.AUTO\n",
        "            ),\n",
        "        )\n",
        "\n",
        "        response = client.search(request)\n",
        "        logger.info(\"Search successful.\")\n",
        "        return response\n",
        "\n",
        "    except Exception as e:  # pylint: disable=broad-exception-caught\n",
        "        logger.error(\"Error during data store search: %s\", e)\n",
        "        return None"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "aec3a92f-916b-4b29-b45a-780b31b7a655",
      "metadata": {
        "id": "ac0a77f6fb55"
      },
      "source": [
        "### Extracts the first link from the top search response."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "id": "7bd1a8cb-9e05-4204-805a-61cd7dabaa21",
      "metadata": {
        "id": "1908e84e6489"
      },
      "outputs": [],
      "source": [
        "def get_first_link(response: discoveryengine.SearchResponse | None) -> str | None:\n",
        "    \"\"\"\n",
        "    Extracts the first link from the search response.\n",
        "\n",
        "    Args:\n",
        "        response (Optional[discoveryengine.SearchResponse]):\n",
        "          The search response object from the Discovery Engine API.\n",
        "\n",
        "    Returns:\n",
        "        Optional[str]: The first link extracted from the search results.\n",
        "    \"\"\"\n",
        "    logger.info(\"Extracting first link from search response...\")\n",
        "    if response is None or not response.results:\n",
        "        logger.error(\"No results found or empty response.\")\n",
        "        return None\n",
        "\n",
        "    try:\n",
        "        first_result = response.results[0]\n",
        "        result_json = json_format.MessageToDict(\n",
        "            first_result.document._pb  # pylint: disable=protected-access\n",
        "        )\n",
        "        derived_struct_data = result_json.get(\"derivedStructData\", {})\n",
        "        link = derived_struct_data.get(\"link\", None)\n",
        "        logger.info(\"First link extracted successfully: %s\", link)\n",
        "        return link\n",
        "    except Exception as e:  # pylint: disable=broad-exception-caught\n",
        "        logger.error(\"Error extracting link from results: %s\", e)\n",
        "        return None"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "74d7b43e-4a14-4a1b-9de6-b132f6313d8f",
      "metadata": {
        "id": "0905b7c8fb99"
      },
      "source": [
        "### Loads and formats the full text from the given link using requests and BeautifulSoup."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "id": "39b5e9d5-5f0e-4b60-88e6-b3f66f83e26a",
      "metadata": {
        "id": "1885380243bb"
      },
      "outputs": [],
      "source": [
        "def load_and_format_page_content(link: str) -> dict[str, str] | None:\n",
        "    \"\"\"\n",
        "    Loads and formats the full text from the given link using requests and\n",
        "     BeautifulSoup.\n",
        "\n",
        "    Args:\n",
        "        link (str): The URL to fetch and extract the content from.\n",
        "\n",
        "    Returns:\n",
        "        Optional[Dict[str, str]]:\n",
        "        A dictionary with formatted source,\n",
        "          title, description,\n",
        "          and page content.\n",
        "    \"\"\"\n",
        "    logger.info(\"Loading and formatting page content from: %s\", link)\n",
        "    try:\n",
        "        response = requests.get(link)\n",
        "        response.raise_for_status()  # Ensure we notice bad responses\n",
        "\n",
        "        soup = BeautifulSoup(response.text, \"html.parser\")\n",
        "\n",
        "        # Extract title, source, description, and page content\n",
        "        title = soup.title.string.strip() if soup.title else \"No title available\"\n",
        "        source = link\n",
        "        description_meta = soup.find(\"meta\", {\"name\": \"description\"})\n",
        "        description = (\n",
        "            description_meta[\"content\"]\n",
        "            if description_meta\n",
        "            else \"No description available\"\n",
        "        )\n",
        "        page_content = \" \".join(p.get_text() for p in soup.find_all(\"p\"))\n",
        "\n",
        "        logger.info(\"Page content loaded and formatted successfully.\")\n",
        "        return {\n",
        "            \"source\": source,\n",
        "            \"title\": title,\n",
        "            \"description\": description,\n",
        "            \"page_content\": page_content,\n",
        "        }\n",
        "    except Exception as e:  # pylint: disable=broad-exception-caught\n",
        "        logger.error(\"Error loading content from link: %s\", e)\n",
        "        return None"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "15582e7a-251c-4fad-91c6-6f4b6e780532",
      "metadata": {
        "id": "551c725708e5"
      },
      "source": [
        "### Performs a search, extracts the first link, and retrieves and formats the full text from the link."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "id": "733df3e7-74fc-4bcd-81d9-5a8cc191469c",
      "metadata": {
        "id": "c963366b9a48"
      },
      "outputs": [],
      "source": [
        "def get_page_contents(search_query: str) -> str | None:\n",
        "    \"\"\"\n",
        "    Performs a search, extracts the first link,\n",
        "      and retrieves and formats the full text from the link.\n",
        "\n",
        "    Args:\n",
        "        search_query (str): The search query string.\n",
        "\n",
        "    Returns:\n",
        "        Optional[str]:\n",
        "        The full text extracted from the first link of the search results.\n",
        "    \"\"\"\n",
        "    logger.info(\"Getting page contents for query: %s\", search_query)\n",
        "    response = get_relevant_snippets(search_query)\n",
        "    link = get_first_link(response)\n",
        "    if link:\n",
        "        details = load_and_format_page_content(link)\n",
        "        logger.info(\"Page contents retrieved successfully.\")\n",
        "        return details if details else None\n",
        "    logger.warning(\"No link found for the query.\")\n",
        "    return None"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "bf4769fc-5e7c-4f68-8202-7d50b0a3f2a5",
      "metadata": {
        "id": "01ad085708b7"
      },
      "source": [
        "#### Prompt for response summarization"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "id": "32efb3dc-6108-4c33-8604-02fda882e461",
      "metadata": {
        "id": "4ba87432ee5c"
      },
      "outputs": [],
      "source": [
        "WEBPAGE_EXTRACTION_TEMPLATE = \"\"\" You are a helpful and informative Q&A bot. A user will provide you with text content from a web page and ask questions related to it. \n",
        "Your task is to analyze the content and answer the user's questions accurately and concisely. \n",
        "\n",
        "Here's how you should approach each request:\n",
        "\n",
        "1. Thoroughly read the provided web page content.\n",
        "2. Understand the user's question.\n",
        "3. Identify the relevant information within the content.\n",
        "4. Formulate a clear and concise answer based on the content.Whenever possible, use bullet points to summarize the answer.\n",
        "5. If the answer cannot be found in the content**, say \"I'm sorry, but I cannot find the answer to that question in the provided text.\"\n",
        "\n",
        "Example:\n",
        "\n",
        "User:\n",
        "Here's the content from a web page: {context} \n",
        "My question is: What are the use cases of Vertex AI?\n",
        "\n",
        "Bot:\n",
        "[Provide a concise answer based on the web page content, Use Markdown and bullet Points where ever applicable. If the answer is not found, say you cannot find it.] \n",
        "\"\"\"\n",
        "WEBPAGE_EXTRACTION_PROMPT = PromptTemplate(\n",
        "    input_variables=[\"context\"], template=WEBPAGE_EXTRACTION_TEMPLATE\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "688dd8ff-da62-4e5c-9de5-40b72d992d79",
      "metadata": {
        "id": "451ba8781bd8"
      },
      "source": [
        "### Create Q&A chain"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "id": "dfafed12-9ddd-46e4-8093-d6aa048751f5",
      "metadata": {
        "id": "3dd1545bfbde"
      },
      "outputs": [],
      "source": [
        "def get_chain():\n",
        "    \"\"\"Return a RunnableSerializable Chain\"\"\"\n",
        "    logger.info(\"Building VertexAI chain...\")\n",
        "    search_llm_kwargs = {\"prompt\": WEBPAGE_EXTRACTION_PROMPT}\n",
        "\n",
        "    return VertexAI(\n",
        "        model_name=\"gemini-2.0-flash\",\n",
        "        verbose=False,\n",
        "        search_llm_kwargs=search_llm_kwargs,\n",
        "        return_direct=False,\n",
        "        generation_config={\n",
        "            \"temperature\": 0.2,\n",
        "            \"top_p\": 0.95,\n",
        "            \"top_k\": 10,\n",
        "            \"max_output_tokens\": 4000,\n",
        "        },\n",
        "    )"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "87155e1b-c9d1-4b47-be93-d25aa3d6804d",
      "metadata": {
        "id": "df21ab2864be"
      },
      "source": [
        "### Invoking chain with query"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "id": "83373eb9-5505-4ff6-8489-2e0d39989e8d",
      "metadata": {
        "id": "adfab87fc413"
      },
      "outputs": [],
      "source": [
        "def invoke(query: str) -> Output_co:\n",
        "    \"\"\"Invoke chain and return the answer\"\"\"\n",
        "    logger.info(\"Invoking chain with query: %s\", query)\n",
        "    page_content = get_page_contents(query)\n",
        "    if page_content:\n",
        "        logger.info(\"Page content retrieved successfully.\")\n",
        "        chain = get_chain()\n",
        "        formatted_prompt = WEBPAGE_EXTRACTION_PROMPT.format(context=page_content)\n",
        "        response = chain(formatted_prompt)\n",
        "        return response[\"result\"] if \"result\" in response else response\n",
        "    logger.warning(\"No relevant context found to summarize.\")\n",
        "    return \"No relevant context found to summarize.\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "id": "766b76c6-2a35-41f6-9b01-8ec9f40dccc9",
      "metadata": {
        "id": "d28bcaca4870"
      },
      "outputs": [],
      "source": [
        "search_query = \"What are the benefits of Vertex AI?\"\n",
        "invoke(search_query)"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "vertex_ai_search_website_summary.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
